Low bit-rate audio encoding



The present invention relates to encoding and decoding of broadband signals,
in particular audio signals.

When transmitting broadband signals, e.g. audio signals such as speech,
compression or encoding techniques are used to reduce the bandwidth or bit rate of the
signal.

Fig. 1 shows a known parametric encoding scheme, in particular a sinusoidal
encoder, which is used in the present invention, and which is described in WO 01/69593. In
this encoder, an input audio signal x(t) is split into several (possibly overlapping) time
segments or frames, typically of duration 20 ms each. Each segment is decomposed into
transient, sinusoidal and noise components. It is also possible to derive other components of
the input audio signal such as harmonic complexes, although these are not relevant for the
purposes of the present invention.

In the sinusoidal analyzer 130, the signal x2 for each segment is modeled
using a number of sinusoids represented by amplitude, frequency and phase parameters. This
information is usually extracted for an analysis time interval by performing a Fourier
transform (FT), which provides a spectral representation of the interval including
frequencies, amplitudes for each frequency, and phases for each frequency, where each phase
is "wrapped", i.e. in the range [-π, π]. Once the sinusoidal information for a segment is
estimated, a tracking algorithm is initiated. This algorithm uses a cost function to link
sinusoids in different segments with each other on a segment-to-segment basis to obtain
so-called tracks. The tracking algorithm thus results in sinusoidal codes C_S comprising
sinusoidal tracks that start at a specific time instance, evolve for a certain duration of time
over a plurality of time segments and then stop.

In such sinusoidal encoding, it is usual to transmit frequency information for
the tracks formed in the encoder. This can be done in a simple manner and at relatively
low cost, since tracks only have slowly varying frequency. Frequency information can
therefore be transmitted efficiently by time differential encoding. In general, amplitude can
also be encoded differentially over time.

In contrast to frequency, phase changes more rapidly with time. If the
frequency is constant, the phase will change linearly with time, and frequency changes will
result in corresponding phase deviations from the linear course. As a function of the track
segment index, phase will have an approximately linear behavior. Transmission of encoded
phase is therefore more complicated. However, when transmitted, phase is limited to the
range [-π, π], i.e. the phase is "wrapped", as provided by the Fourier transform. Because of
this modulo 2π representation of phase, the structural inter-frame relation of the phase is lost
and the phase, at first sight, appears to be a random variable.

However, since the phase is the integral of the frequency, the phase is
redundant and, in principle, need not be transmitted. This is called phase continuation and
reduces the bit rate significantly.

In phase continuation, only the phase of the first sinusoid of each track is transmitted in
order to save bit rate. Each subsequent phase is calculated from the initial phase and
frequencies of the track. Since the frequencies are quantized and not always very accurately
estimated, the continued phase will deviate from the measured phase. Experiments show
that phase continuation degrades the quality of an audio signal.

Transmitting the phase for every sinusoid increases the quality of the decoded
signal at the receiver end, but it also results in a significant increase in bit rate/bandwidth.
Therefore, a joint frequency/phase quantizer can be used, in which the measured phases of a
sinusoidal track, having values between -π and π, are unwrapped using the measured
frequencies and linking information, resulting in monotonically increasing unwrapped phases
along a track. In that encoder the unwrapped phases are quantized using an Adaptive
Differential Pulse Code Modulation (ADPCM) quantizer and transmitted to the decoder. The
decoder derives the frequencies and the phases of a sinusoidal track from the unwrapped
phase trajectory.

In phase continuation, only the encoded frequency is transmitted, and the
phase is recovered at the decoder from the frequency data by exploiting the integral relation
between phase and frequency. It is known, however, that when phase continuation is used,
the phase cannot be perfectly recovered. If frequency errors occur, e.g. due to measurement
errors in the frequency or due to quantization noise, the phase, being reconstructed using the
integral relation, will typically show an error having the character of drift. This is because
frequency errors have an approximately random character. Low-frequency errors are
amplified by integration, and consequently the recovered phase will tend to drift away from
the actually measured phase. This leads to audible artifacts.
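By way of illustration only, the drift described above can be reproduced with a few lines of Python. The sketch assumes trapezoidal integration of the frequency between frame centers; the function name `continue_phase`, the 10 ms update interval, the 440 Hz track and the constant frequency error of 1 rad/s are hypothetical choices made for the example, not values taken from the encoder.

```python
import math


def continue_phase(phi0, freqs, update_s):
    """Rebuild phases (rad) from the start phase and per-frame frequencies (rad/s).

    Uses trapezoidal integration of the frequency between frame centers.
    """
    phases = [phi0]
    for k in range(1, len(freqs)):
        phases.append(phases[-1] + (freqs[k] + freqs[k - 1]) * update_s / 2)
    return phases


# Illustrative values (assumptions): 10 ms update interval, 440 Hz track.
U = 0.01
w = 2 * math.pi * 440.0
exact = continue_phase(0.3, [w] * 50, U)
# A constant frequency error of 1 rad/s stands in for a quantization error.
biased = continue_phase(0.3, [w + 1.0] * 50, U)
drift = [b - e for b, e in zip(biased, exact)]
print(drift[1], drift[49])  # the phase error keeps growing along the track
```

A frequency error that persists over several frames is accumulated by the integration, so the phase error grows along the track instead of staying bounded.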
This is illustrated in Fig. 2a, where Ω and ψ are the real frequency and real
phase, respectively, for a track. In both the encoder and decoder, frequency and phase have an
integral relationship as represented by the letter "I". The quantization process in the encoder
is modeled as an added noise n. In the decoder, the recovered phase ψ̂ thus includes two
components: the real phase ψ and a noise component ε2, where both the spectrum of the
recovered phase and the power spectral density function of the noise ε2 have a pronounced
low-frequency character.

Thus, it can be seen that in phase continuation, since the recovered phase is the
integral of a low-frequency signal, the recovered phase is a low-frequency signal itself.
However, the noise introduced in the reconstruction process is also dominant in this
low-frequency range. It is therefore difficult to separate these sources with a view to filtering
the noise n introduced during encoding.

In conventional quantization methods, frequency and phase are quantized
independently of each other. In general, a uniform scalar quantizer is applied to the phase
parameter. For perceptual reasons the lower frequencies should be quantized more accurately
than the higher frequencies. Therefore the frequencies are converted to a non-uniform
representation using the ERB or Bark function and then quantized uniformly, resulting in a
non-uniform quantizer. Physical reasons can also be found: in harmonic complexes, higher
harmonic frequencies tend to have higher frequency variations than the lower frequencies.

When the frequency and phase are quantized jointly, frequency dependent
quantization accuracy is not straightforward. The use of a uniform quantization approach
results in a low quality sound reconstruction. Furthermore, for the high frequencies, where
the quantization accuracy can be lowered, a quantizer can be developed that needs fewer bits.
For the unwrapped phases, a similar mechanism would be desirable.

The invention provides a method of encoding a broadband signal, in particular
an audio signal such as a speech signal, using a low bit rate. In the sinusoidal encoder a
number of sinusoids are estimated per audio segment. A sinusoid is represented by
frequency, amplitude and phase. Normally, phase is quantized independently of frequency. The
invention uses a frequency dependent quantization of phase, and in particular the low
frequencies are quantized using smaller quantization intervals than the higher frequencies.
Thus, the unwrapped phases of the lower frequencies are quantized more accurately, possibly
with a smaller quantization range, than the phases of the higher frequencies. The invention
gives a significant improvement in decoded signal quality, especially for low bit-rate
quantizers.
The invention enables the use of joint quantization of frequency and phase
while having a non-uniform frequency quantization as well. This results in the advantage of
transmitting phase information with a low bit rate while still maintaining good phase
accuracy and signal quality at all frequencies, in particular also at low frequencies.

The advantage of this method is improved phase accuracy, in particular at the
lower frequencies, where a phase error corresponds to a larger time error than at higher
frequencies. This is important, since the human ear is not only sensitive to frequency and
phase but also to absolute timing as in transients, and the method of the invention results in
improved sound quality, especially when only a small number of bits is used for quantizing
the phase and frequency values. On the other hand, a required sound quality can be obtained
using fewer bits. Since the low frequencies are slowly varying, the quantization range can be
more limited and a more accurate quantization is obtained. Furthermore, the adaptation to a
finer quantization is much faster.

The invention can be used in an audio encoder where sinusoids are used. The
invention relates both to the encoder and the decoder.



Fig. 1 shows a prior art audio encoder in which an embodiment of the
invention is implemented;

Fig. 2a illustrates the relationship between phase and frequency in prior art
systems;

Fig. 2b illustrates the relationship between phase and frequency in audio
systems according to the present invention;

Figs. 3a and 3b show a preferred embodiment of a sinusoidal encoder
component of the audio encoder of Fig. 1;

Fig. 4 shows an audio player in which an embodiment of the invention is
implemented;

Figs. 5a and 5b show a preferred embodiment of a sinusoidal synthesizer
component of the audio player of Fig. 4; and

Fig. 6 shows a system comprising an audio encoder and an audio player
according to the invention.
Preferred embodiments of the invention will now be described with reference
to the accompanying drawings, wherein like components have been accorded like reference
numerals and, unless otherwise stated, perform like functions. In a preferred embodiment of
the present invention, the encoder 1 is a sinusoidal encoder of the type described in
WO 01/69593, Fig. 1. The operation of this prior art encoder and its corresponding decoder
has been well described, and description is only provided here where relevant to the present
invention.

In both the prior art and the preferred embodiment of the present invention, the
audio encoder 1 samples an input audio signal at a certain sampling frequency, resulting in a
digital representation x(t) of the audio signal. The encoder 1 then separates the sampled input
signal into three components: transient signal components, sustained deterministic
components, and sustained stochastic components. The audio encoder 1 comprises a transient
encoder 11, a sinusoidal encoder 13 and a noise encoder 14.

The transient encoder 11 comprises a transient detector (TD) 110, a transient
analyzer (TA) 111 and a transient synthesizer (TS) 112. First, the signal x(t) enters the
transient detector 110. This detector 110 estimates if there is a transient signal component
and its position. This information is fed to the transient analyzer 111. If the position of a
transient signal component is determined, the transient analyzer 111 tries to extract (the main
part of) the transient signal component. It matches a shape function to a signal segment,
preferably starting at an estimated start position, and determines content underneath the shape
function, by employing for example a (small) number of sinusoidal components. This
information is contained in the transient code C_T, and more detailed information on
generating the transient code C_T is provided in WO 01/69593.

The transient code C_T is furnished to the transient synthesizer 112. The
synthesized transient signal component is subtracted from the input signal x(t) in
subtracter 16, resulting in a signal x1. A gain control mechanism GC (12) is used to
produce x2 from x1.

The signal x2 is furnished to the sinusoidal encoder 13 where it is analyzed in
a sinusoidal analyzer (SA) 130, which determines the (deterministic) sinusoidal components.
It will therefore be seen that while the presence of the transient analyzer is desirable, it is not
necessary and the invention can be implemented without such an analyzer. Alternatively, as
mentioned above, the invention can also be implemented with, for example, a harmonic
complex analyzer. In brief, the sinusoidal encoder encodes the input signal x2 as tracks of
sinusoidal components linked from one frame segment to the next.
Referring now to Fig. 3a, in the same manner as in the prior art, in the
preferred embodiment, each segment of the input signal x2 is transformed into the frequency
domain in a Fourier transform (FT) unit 40. For each segment, the FT unit provides measured
amplitudes A, phases φ and frequencies ω. As mentioned previously, the range of phases
provided by the Fourier transform is restricted to -π ≤ φ < π. A tracking algorithm (TA)
unit 42 takes the information for each segment and, by employing a suitable cost function,
links sinusoids from one segment to the next, so producing a sequence of measured
phases φ(k) and frequencies ω(k) for each track.

In contrast to the prior art, the sinusoidal codes C_S ultimately produced by the
analyzer 130 include phase information, and frequency is reconstructed from this information
in the decoder.

As mentioned above, however, the measured phase is wrapped, which means
that it is restricted to a modulo 2π representation. Therefore, in the preferred embodiment, the
analyzer comprises a phase unwrapper (PU) 44 where the modulo 2π phase representation is
unwrapped to expose the structural inter-frame phase behavior ψ for a track. As the
frequency in sinusoidal tracks is nearly constant, it will be seen that the unwrapped phase ψ
will typically be a nearly linearly increasing (or decreasing) function, and this makes cheap
transmission of phase, i.e. with low bit rate, possible. The unwrapped phase ψ is provided as
input to a phase encoder (PE) 46, which provides as output quantized representation levels r
suitable for being transmitted.

Referring now to the operation of the phase unwrapper 44, as mentioned
above, instantaneous phase ψ and instantaneous frequency Ω for a track are related by:

$$\psi(t) = \int_{T_0}^{t} \Omega(\tau)\,d\tau + \psi(T_0) \qquad (1)$$

where T0 is a reference time instant.

A sinusoidal track in frames k = K, K+1, ..., K+L-1 has measured
frequencies ω(k) (expressed in radians per second) and measured phases φ(k) (expressed in
radians). The distance between the centers of the frames is given by U (update rate expressed
in seconds). The measured frequencies are supposed to be samples of the assumed underlying
continuous-time frequency track Ω with ω(k) = Ω(kU) and, similarly, the measured phases
are samples of the associated continuous-time phase track ψ with φ(k) = ψ(kU) mod (2π).
For sinusoidal encoding it is assumed that Ω is a nearly constant function.
Assuming that the frequencies are nearly constant within a segment, Equation 1
can be approximated as follows:

$$\psi(kU) = \int_{(k-1)U}^{kU} \Omega(t)\,dt + \psi((k-1)U) \approx \{\omega(k) + \omega(k-1)\}\,U/2 + \psi((k-1)U) \qquad (2)$$

It will therefore be seen that knowing the phase and frequency for a given
segment and the frequency of the next segment, it is possible to estimate an unwrapped phase
value for the next segment, and so on for each segment in a track.

In the preferred embodiment, the phase unwrapper determines an unwrap
factor m(k) at time instant k:

$$\psi(kU) = \phi(k) + m(k)\,2\pi \qquad (3)$$

The unwrap factor m(k) tells the phase unwrapper 44 the number of cycles
which has to be added to obtain the unwrapped phase.

Combining equations 2 and 3, the phase unwrapper determines an incremental
unwrap factor e(k) as follows:

$$2\pi e(k) = 2\pi\{m(k) - m(k-1)\} = \{\omega(k) + \omega(k-1)\}\,U/2 - \{\phi(k) - \phi(k-1)\}$$

where e should be an integer. However, due to measurement and model errors,
the incremental unwrap factor will not be an integer exactly, so:

$$e(k) = \mathrm{round}\big([\{\omega(k) + \omega(k-1)\}\,U/2 - \{\phi(k) - \phi(k-1)\}]/(2\pi)\big)$$

assuming that the model and measurement errors are small.

Having the incremental unwrap factor e, the m(k) from equation (3) is
calculated as the cumulative sum of e(k) where, without loss of generality, the phase
unwrapper starts in the first frame K with m(K) = 0, and from m(k) and φ(k), the (unwrapped)
phase ψ(kU) is determined.
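The unwrapping procedure of equations 2 and 3 can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the phase unwrapper 44: the names `unwrap_track` and `wrap`, the 8 ms update interval and the 440 Hz test track are assumptions introduced for the example.

```python
import math

TWO_PI = 2 * math.pi


def unwrap_track(wrapped, freqs, update_s):
    """Unwrap the measured phases (rad) of one track using its frequencies (rad/s)."""
    m = 0  # unwrap factor, m(K) = 0 in the first frame of the track
    unwrapped = [wrapped[0]]
    for k in range(1, len(wrapped)):
        expected = (freqs[k] + freqs[k - 1]) * update_s / 2  # phase advance per eq. 2
        measured = wrapped[k] - wrapped[k - 1]
        e = round((expected - measured) / TWO_PI)  # incremental unwrap factor e(k)
        m += e  # cumulative sum gives m(k)
        unwrapped.append(wrapped[k] + m * TWO_PI)  # eq. 3
    return unwrapped


def wrap(angle):
    """Map an angle to the range [-pi, pi), as a Fourier transform would report it."""
    return (angle + math.pi) % TWO_PI - math.pi


# Illustrative track (assumption): constant 440 Hz, 8 ms between frame centers.
U = 0.008
w = 2 * math.pi * 440.0
true_psi = [0.5 + w * U * k for k in range(6)]
measured_phi = [wrap(p) for p in true_psi]
recovered = unwrap_track(measured_phi, [w] * 6, U)
```

In this noise-free example the argument of the rounding operation is an integer up to floating-point error, so the recovered phases coincide with the underlying linear phase track.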

In practice, the sampled data ψ(kU) and Ω(kU) are distorted by
measurement errors:

$$\phi(k) = \psi(kU) + \varepsilon_1(k)$$
$$\omega(k) = \Omega(kU) + \varepsilon_2(k)$$

where ε1 and ε2 are the phase and frequency errors, respectively. In order to
prevent the determination of the unwrap factor becoming ambiguous, the measurement data
needs to be determined with sufficient accuracy. Thus, in the preferred embodiment, tracking
is restricted so that:

$$\delta(k) = e(k) - [\{\omega(k) + \omega(k-1)\}\,U/2 - \{\phi(k) - \phi(k-1)\}]/(2\pi) < \delta_0$$
where δ is the error in the rounding operation. The error δ is mainly
determined by the errors in ω due to the multiplication with U. Assume that ω is determined
from the maxima of the absolute value of the Fourier transform from a sampled version of the
input signal with sampling frequency F_s and that the resolution of the Fourier transform
is 2π/L_a with L_a the analysis size. In order to be within the considered bound, we have:

That means that the analysis size should be a few times larger than the update
size in order for unwrapping to be accurate, e.g., setting δ0 = 1/4, the analysis size should be
four times the update size (neglecting the errors ε1 in the phase measurement).

The second precaution which can be taken to avoid decision errors in the
round operation is to define tracks appropriately. In the tracking unit 42, sinusoidal tracks
are typically defined by considering amplitude and frequency differences. Additionally, it is
also possible to account for phase information in the linking criterion. For instance, we can
define the phase prediction error ε as the difference between the measured value and the
predicted value φ̃ according to

$$\varepsilon = \phi(k) - \tilde{\phi}(k)$$

where the predicted value can be taken as

$$\tilde{\phi}(k) = \phi(k-1) + \{\omega(k) + \omega(k-1)\}\,U/2$$

Thus, preferably the tracking unit 42 forbids tracks where ε is larger than a
certain value (e.g. ε > π/2), resulting in an unambiguous definition of e(k).

Additionally, the encoder may calculate the phases and frequencies such as
will be available in the decoder. If the phases or frequencies which will become available in
the decoder differ too much from the phases and/or frequencies such as are present in the
encoder, it may be decided to interrupt a track, i.e. to signal the end of a track and start a new
one using the current frequency and phase and their linked sinusoidal data.

The sampled unwrapped phase ψ(kU) produced by the phase unwrapper
(PU) 44 is provided as input to phase encoder (PE) 46 to produce the set of representation
levels r. Techniques for efficient transmission of a generally monotonically changing
characteristic such as the unwrapped phase are known. In the preferred embodiment, Fig. 3b,
Adaptive Differential Pulse Code Modulation (ADPCM) is employed. Here, a predictor
(PF) 48 is used to estimate the phase of the next track segment and encode the difference
only in a quantizer (Q) 50. Since ψ is expected to be a nearly linear function and for reasons
of simplicity, the predictor 48 is chosen as a second-order filter of the form:

$$y(k+1) = 2x(k) - x(k-1)$$

where x is the input and y is the output. It will be seen, however, that it is also
possible to take other functional relations (including higher-order relations) and to include
(backward or forward) adaptation of the filter coefficients. In the preferred
embodiment, a backward adaptive control mechanism (QC) 52 is used for simplicity to
control the quantizer 50. Forward adaptive control is also possible but would require
extra bit rate overhead.
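The second-order predictor is a linear extrapolation from the two most recent samples, which the following Python sketch makes explicit. The function name `predict_next` and the test track advancing by 1.7 rad per frame are assumptions chosen for illustration.

```python
def predict_next(x_k, x_km1):
    """Second-order predictor y(k+1) = 2*x(k) - x(k-1): linear extrapolation."""
    return 2 * x_k - x_km1


# Illustrative track (assumption): unwrapped phase advancing by 1.7 rad per frame.
track = [0.2 + 1.7 * k for k in range(5)]
errors = [track[k + 1] - predict_next(track[k], track[k - 1]) for k in range(1, 4)]
print(errors)  # prediction errors for an exactly linear phase track
```

For an exactly linear phase the prediction error vanishes up to floating-point error, so only deviations from a constant frequency remain to be quantized.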

As will be seen, initialization of the encoder (and decoder) for a track starts
with knowledge of the start phase φ(0) and frequency ω(0). These are quantized and
transmitted by a separate mechanism. Additionally, the initial quantization step used in the
quantization controller 52 of the encoder and the corresponding controller 62 in the decoder,
Fig. 5b, is either transmitted or set to a certain value in both encoder and decoder. Finally, the
end of a track can either be signaled in a separate side stream or as a unique symbol in the bit
stream of the phases.

The start frequency of the unwrapped phase is known, both in the encoder and
in the decoder. On the basis of this frequency, the quantization accuracy is chosen. For the
unwrapped phase trajectories beginning with a low frequency, a more accurate quantization
grid, i.e. a higher resolution, is chosen than for an unwrapped phase trajectory beginning with
a higher frequency.

In the ADPCM quantizer, the unwrapped phase ψ(k), where k represents the
number in the track, is predicted/estimated from the preceding phases in the track. The
difference between the predicted phase ψ̃(k) and the unwrapped phase ψ(k) is then
quantized and transmitted. The quantizer is adapted for every unwrapped phase in the track.
When the prediction error is small, the quantizer limits the range of possible values and the
quantization can become more accurate. On the other hand, when the prediction error is
large, the quantizer uses a coarser quantization.

The quantizer Q (in Fig. 3b) quantizes the prediction error Δ, which is
calculated by

$$\Delta = \psi(k) - \tilde{\psi}(k)$$

The prediction error Δ can be quantized using a look-up table. For this
purpose, a table Q is maintained. For example, for a 2-bit ADPCM quantizer, the initial table
for Q may look like the table shown in Table 1.



Index i    Lower boundary bl    Upper boundary bu
0          -∞                   -3.0
1          -3.0                 0
2          0                    3.0
3          3.0                  ∞

Table 1: Quantization table Q used for first continuation.



The quantization is done as follows. The prediction error Δ is compared to the
boundaries b, such that the following relation is satisfied:

$$bl_i < \Delta \le bu_i$$

From the value of i that satisfies the above relation, the representation level r
is computed by r = i.
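As a sketch of this look-up, the interval search can be written with the Python standard library. The names `BOUNDARIES` and `quantize_index` are hypothetical; the boundary values are those of Table 1, with the outer intervals extending to minus and plus infinity.

```python
import bisect

# Interior boundaries of Table 1; the outer intervals are open-ended.
BOUNDARIES = [-3.0, 0.0, 3.0]


def quantize_index(delta, boundaries=BOUNDARIES):
    """Return the index i for which bl_i < delta <= bu_i holds."""
    # bisect_left counts the boundaries strictly below delta, which places a
    # value equal to a boundary in the interval that has it as upper boundary.
    return bisect.bisect_left(boundaries, delta)


print([quantize_index(d) for d in (-3.5, -3.0, -0.1, 0.0, 2.0, 3.0, 10.0)])
```

Using `bisect_left` rather than `bisect_right` matches the relation above, in which the upper boundary belongs to the interval and the lower boundary does not.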

The associated representation levels are stored in representation table R, which 
is shown in Table 2. 



Representation level r    Representation table R    Level type
0                         -3.0                      Outer level
1                         -0.75                     Inner level
2                         0.75                      Inner level
3                         3.0                       Outer level

Table 2: Representation table R used for first continuation.



The entries of tables Q and R are multiplied by a factor c for the quantization of
the next sinusoidal component in the track:

$$Q(k+1) = Q(k) \cdot c$$
$$R(k+1) = R(k) \cdot c$$
During the decoding of a track, both tables are scaled according to the
generated representation levels r. If r is either 1 or 2 (inner level) for the current sub-frame,
then the scale factor c for the quantization table is set to

$$c = 2^{-1/4}$$

Since c < 1, the frequency and phase of the next sinusoid in a track become
more accurate. If r is 0 or 3 (outer level), the scale factor is set to

$$c = 2^{1/2}$$

Since c > 1, the quantization accuracy for the next sinusoid in a track
decreases. Using these factors, one up-scaling can be undone by two down-scalings.
The difference in upscale and downscale factors results in a fast onset of an upscaling,
whereas a corresponding downscaling requires two steps.
In order to avoid very small or very large entries in the quantization table, the
adaptation is only done if the absolute value of the inner level is between π/64 and 3π/4.
Otherwise, c is set to 1.

In the decoder only table R has to be maintained to convert the received
representation levels r to a quantized prediction error. This de-quantization operation is
performed by block DQ in Fig. 5b.
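The backward adaptive scheme described above can be sketched as a small encoder/decoder pair in Python. Several points are assumptions made for the example and are not taken from the text: the function names, the test track, starting the predictor from a virtual sample ψ(0) - ω(0)U before the birth, and skipping the adaptation when the scaled inner level would leave the range from π/64 to 3π/4. The initial tables are those of the scale factor 1 row of Table 3.

```python
import bisect
import math

INNER_MIN = math.pi / 64
INNER_MAX = 3 * math.pi / 4
DOWN = 2 ** -0.25  # used after an inner level (r = 1 or 2): finer grid
UP = 2 ** 0.5      # used after an outer level (r = 0 or 3): coarser grid


def adapt(q_bounds, r_levels, r):
    """Scale the boundary table Q and the representation table R by c."""
    c = DOWN if r in (1, 2) else UP
    # Assumption: c falls back to 1 when the scaled inner level would leave
    # the range [pi/64, 3*pi/4].
    if not INNER_MIN <= abs(r_levels[1]) * c <= INNER_MAX:
        c = 1.0
    return [b * c for b in q_bounds], [v * c for v in r_levels]


def encode_track(psi, omega0, update_s, q_bounds, r_levels):
    """Return (levels, reconstruction) for the unwrapped phases psi of a track."""
    x_prev2 = psi[0] - omega0 * update_s  # assumed virtual sample before the birth
    x_prev = psi[0]
    levels, recon = [], [psi[0]]
    for target in psi[1:]:
        predicted = 2 * x_prev - x_prev2  # second-order predictor
        r = bisect.bisect_left(q_bounds, target - predicted)
        x = predicted + r_levels[r]  # what the decoder will reconstruct
        levels.append(r)
        recon.append(x)
        x_prev2, x_prev = x_prev, x
        q_bounds, r_levels = adapt(q_bounds, r_levels, r)
    return levels, recon


def decode_track(psi0, omega0, update_s, levels, r_levels):
    """Rebuild the unwrapped phase from the levels; only table R is tracked."""
    x_prev2 = psi0 - omega0 * update_s
    x_prev = psi0
    out = [psi0]
    for r in levels:
        predicted = 2 * x_prev - x_prev2
        x = predicted + r_levels[r]
        out.append(x)
        x_prev2, x_prev = x_prev, x
        _, r_levels = adapt([], r_levels, r)
    return out


Q0 = [-1.5, 0.0, 1.5]          # interior boundaries; outer ones are -inf and +inf
R0 = [-3.0, -0.75, 0.75, 3.0]
U = 0.008
w0 = 2 * math.pi * 440.0
psi = [0.3 + w0 * U * k + 0.02 * k * k for k in range(20)]
levels, recon = encode_track(psi, w0, U, Q0, R0)
decoded = decode_track(psi[0], w0, U, levels, R0)
```

Because the encoder predicts from its own reconstruction and adapts the tables from the transmitted levels only, the decoder follows the same table evolution without any side information.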

With the above settings alone, the quality of the reconstructed sound still needs
improvement. In accordance with the invention, different initial tables for unwrapped phase
tracks, depending on the start frequency, are used. Hereby a better sound quality is obtained.
This is done as follows. The initial tables Q and R are scaled on the basis of the first frequency
of the track. In Table 3, the scale factors are given together with the frequency ranges. If the
first frequency of a track lies in a certain frequency range, the appropriate scale factor is
selected, and the tables R and Q are divided by that scale factor. The end-points can also
depend on the first frequency of the track. In the decoder, a corresponding procedure is
performed in order to start with the correct initial table R.

Frequency range     Scale factor    Initial table Q              Initial table R
0 - 500 Hz          8               -∞  -0.19  0  0.19  ∞        -0.38  -0.09  0.09  0.38
500 - 1000 Hz       4               -∞  -0.37  0  0.37  ∞        -0.75  -0.19  0.19  0.75
1000 - 4000 Hz      2               -∞  -0.75  0  0.75  ∞        -1.5   -0.38  0.38  1.5
4000 - 22050 Hz     1               -∞  -1.5   0  1.5   ∞        -3     -0.75  0.75  3

Table 3: Frequency dependent scale factors and initial tables
Table 3 shows an example of frequency dependent scale factors and
corresponding initial tables Q and R for a 2-bit ADPCM quantizer. The audio frequency
range 0-22050 Hz is divided into four frequency sub-ranges. It is seen that the phase accuracy
is improved in the lower frequency ranges relative to the higher frequency ranges.

The number of frequency sub-ranges and the frequency dependent scale
factors may vary and can be chosen to fit the individual purpose and requirements. As
described above, the frequency dependent initial tables Q and R in Table 3 may be up-scaled
and down-scaled dynamically to adapt to the evolution in phase from one time segment to the
next.
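The selection of the initial tables from the first frequency of a track can be sketched as follows. The names `BASE_Q`, `BASE_R`, `RANGES` and `initial_tables` are hypothetical, and treating each frequency range as including its lower edge only is an assumption, since the text does not say to which range a boundary frequency belongs.

```python
# Tables for scale factor 1 (last row of Table 3); Q holds interior boundaries only.
BASE_Q = [-1.5, 0.0, 1.5]
BASE_R = [-3.0, -0.75, 0.75, 3.0]
# (upper frequency limit in Hz, scale factor), following Table 3.
RANGES = [(500.0, 8), (1000.0, 4), (4000.0, 2), (22050.0, 1)]


def initial_tables(start_freq_hz):
    """Return the initial (Q, R) tables for a track starting at start_freq_hz."""
    scale = 1
    for upper_hz, factor in RANGES:
        # Assumption: a range contains its lower edge but not its upper edge.
        if start_freq_hz < upper_hz:
            scale = factor
            break
    return [b / scale for b in BASE_Q], [v / scale for v in BASE_R]


q_low, r_low = initial_tables(440.0)      # falls in 0 - 500 Hz, scale factor 8
q_high, r_high = initial_tables(10000.0)  # falls in 4000 - 22050 Hz, scale factor 1
```

Dividing by the scale factor reproduces the entries of Table 3 up to the two-decimal rounding used there, e.g. 1.5/8 = 0.1875 is listed as 0.19.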

In e.g. a 3-bit ADPCM quantizer, the initial boundaries of the eight
quantization intervals defined by the 3 bits can be defined as follows:

Q = {-∞, -1.41, -0.707, -0.35, 0, 0.35, 0.707, 1.41, ∞}, and can have a minimum grid
size of π/64 and a maximum grid size of π/2. The representation table R may look like:

R = {-2.117, -1.0585, -0.5285, -0.1750, 0.1750, 0.5285, 1.0585, 2.117}. A
similar frequency dependent initialization of the tables Q and R as shown in Table 3 may be
used in this case.

From the sinusoidal code C_S generated with the sinusoidal encoder, the
sinusoidal signal component is reconstructed by a sinusoidal synthesizer (SS) 131 in the same
manner as will be described for the sinusoidal synthesizer (SS) 32 of the decoder. This signal
is subtracted in subtracter 17 from the input x2 to the sinusoidal encoder 13, resulting in a
remaining signal x3. The residual signal x3 produced by the sinusoidal encoder 13 is passed
to the noise analyzer 14 of the preferred embodiment, which produces a noise code C_N
representative of this noise, as described in, for example, international patent application
No. PCT/EP00/04599.

Finally, in a multiplexer 15, an audio stream AS is constituted which includes
the codes C_T, C_S and C_N. The audio stream AS is furnished to e.g. a data bus, an antenna
system, a storage medium etc.

Fig. 4 shows an audio player 3 suitable for decoding an audio stream AS', e.g.
generated by an encoder 1 of Fig. 1, obtained from a data bus, antenna system, storage
medium etc. The audio stream AS' is de-multiplexed in a de-multiplexer 30 to obtain the
codes C_T, C_S and C_N. These codes are furnished to a transient synthesizer 31, a sinusoidal
synthesizer 32 and a noise synthesizer 33 respectively. From the transient code C_T, the
transient signal components are calculated in the transient synthesizer 31. In case the
transient code indicates a shape function, the shape is calculated based on the received
parameters. Further, the shape content is calculated based on the frequencies and amplitudes
of the sinusoidal components. If the transient code C_T indicates a step, then no transient is
calculated. The total transient signal y_T is a sum of all transients.

The sinusoidal code C_S, including the information encoded by the analyzer 130,
is used by the sinusoidal synthesizer 32 to generate signal y_S. Referring now to Figs. 5a and
5b, the sinusoidal synthesizer 32 comprises a phase decoder (PD) 56 compatible with the
phase encoder 46. Here, a de-quantizer (DQ) 60 in conjunction with a second-order
prediction filter (PF) 64 produces (an estimate of) the unwrapped phase ψ̂ from: the
representation levels r; initial information φ̂(0), ω̂(0) provided to the prediction filter
(PF) 64; and the initial quantization step for the quantization controller (QC) 62.

As illustrated in Fig. 2b, the frequency can be recovered from the unwrapped
phase ψ̂ by differentiation. Assuming that the phase error at the decoder is approximately
white and since differentiation amplifies the high frequencies, the differentiation can be
combined with a low-pass filter to reduce the noise and, thus, to obtain an accurate estimate
of the frequency at the decoder.

In the preferred embodiment, a filtering unit (FR) 58 approximates the
differentiation which is necessary to obtain the frequency ω̂ from the unwrapped phase by
procedures such as forward, backward or central differences. This enables the decoder to
produce as output the phases ψ̂ and frequencies ω̂ usable in a conventional manner to
synthesize the sinusoidal component of the encoded signal.
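A minimal Python sketch of such a difference approximation is given below. It uses central differences inside a track and one-sided differences at the first and last frame; the function name `frequencies_from_phase` and the test values are assumptions for illustration, and no separate low-pass filter is included.

```python
def frequencies_from_phase(psi, update_s):
    """Estimate the frequency (rad/s) in every frame from unwrapped phases (rad).

    Central differences are used inside the track; the first and last frame
    fall back to a forward and a backward difference, respectively.
    """
    n = len(psi)
    if n < 2:
        raise ValueError("at least two phase samples are needed")
    freqs = []
    for k in range(n):
        lo = max(k - 1, 0)
        hi = min(k + 1, n - 1)
        freqs.append((psi[hi] - psi[lo]) / ((hi - lo) * update_s))
    return freqs


# Illustrative track (assumption): constant 1000 rad/s, 8 ms between frames.
U = 0.008
w = 1000.0
psi_hat = [0.1 + w * U * k for k in range(6)]
omega_hat = frequencies_from_phase(psi_hat, U)
```

A central difference equals the average of the forward and the backward difference at the same frame, so it already smooths the estimate slightly compared with a one-sided difference.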

At the same time as the sinusoidal components of the signal are being
synthesized, the noise code C_N is fed to a noise synthesizer NS 33, which is mainly a filter
having a frequency response approximating the spectrum of the noise. The NS 33 generates
reconstructed noise y_N by filtering a white noise signal with the noise code C_N. The total
signal y(t) comprises the sum of the transient signal y_T and the product of any amplitude
decompression (g) and the sum of the sinusoidal signal y_S and the noise signal y_N. The audio
player comprises two adders 36 and 37 to sum respective signals. The total signal is furnished
to an output unit 35, which is e.g. a speaker.

Fig. 6 shows an audio system according to the invention comprising an audio
encoder 1 as shown in Fig. 1 and an audio player 3 as shown in Fig. 4. Such a system offers
playing and recording features. The audio stream AS is furnished from the audio encoder to
the audio player over a communication channel 2, which may be a wireless connection, a
data bus or a storage medium. In case the communication channel 2 is a storage medium,
the storage medium may be fixed in the system or may also be a removable disc, memory
stick etc. The communication channel 2 may be part of the audio system, but will however
often be outside the audio system.

The coded data from several consecutive segments are linked. This is done as 
follows. For each segment a number of sinusoids are determined (for example using an FFT). 
A sinusoid consists of a frequency, amplitude and phase. The number of sinusoids is variable 
per segment. Once the sinusoids are determined for a segment, an analysis is done to connect 
them to sinusoids from the previous segment. This is called 'linking' or 'tracking'. The analysis is 
based on the difference between a sinusoid of the current segment and all sinusoids from the 
previous segment. A link/track is made with the sinusoid in the previous segment that has the 
smallest difference. If even the smallest difference is larger than a certain threshold value, no 
connection to sinusoids of the previous segment is made. In this way a new sinusoid is 
created or "born". 
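The linking step described above can be sketched in Python as follows. The text does not give the form of the cost function, so the weighted sum of absolute differences used here, the weights, and the names `Sinusoid`, `cost` and `link_segment` are assumptions made purely for illustration. The sketch also does not resolve the case where two sinusoids of the current segment select the same predecessor.

```python
import math
from typing import List, NamedTuple, Optional

class Sinusoid(NamedTuple):
    freq: float   # frequency in Hz
    amp: float    # amplitude
    phase: float  # wrapped phase in radians

def cost(a: Sinusoid, b: Sinusoid,
         w_freq: float = 1.0, w_amp: float = 1.0, w_phase: float = 0.0) -> float:
    """Hypothetical cost: weighted absolute differences of the parameters."""
    # Wrap the phase difference into [-pi, pi] before taking its magnitude.
    dphi = math.atan2(math.sin(a.phase - b.phase), math.cos(a.phase - b.phase))
    return (w_freq * abs(a.freq - b.freq)
            + w_amp * abs(a.amp - b.amp)
            + w_phase * abs(dphi))

def link_segment(previous: List[Sinusoid], current: List[Sinusoid],
                 threshold: float) -> List[Optional[int]]:
    """For each current sinusoid, return the index of the linked previous
    sinusoid, or None when the sinusoid is a birth."""
    links: List[Optional[int]] = []
    for cur in current:
        best_index, best_cost = None, math.inf
        for i, prev in enumerate(previous):
            c = cost(cur, prev)
            if c < best_cost:
                best_index, best_cost = i, c
        # No link when even the smallest cost exceeds the threshold.
        links.append(best_index if best_cost <= threshold else None)
    return links
```

With a threshold of 10, a 442 Hz sinusoid links to a 440 Hz predecessor, while a 1500 Hz sinusoid with no nearby predecessor is returned as `None`, i.e. a birth.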

The difference between sinusoids is determined using a 'cost function', which 
uses the frequency, amplitude and phase of the sinusoids. This analysis is performed for each 
segment. The result is a large number of tracks for an audio signal. A track has a birth, which 
is a sinusoid that has no connection with sinusoids from the previous segment. A birth 
sinusoid is encoded non-differentially. Sinusoids that are connected to sinusoids from 
previous segments are called continuations and they are encoded differentially with respect to 
the sinusoids from the previous segment. This saves a lot of bits, since only differences are 
encoded and not absolute values. 

If f(n-1) is the frequency of a sinusoid from the previous segment and f(n) 
is the frequency of a connected sinusoid from the current segment, then f(n) - f(n-1) is 
transmitted to the decoder. The number n represents the position in the track: n = 1 is the 
birth, n = 2 is the first continuation, etc. The same is true for the amplitudes. The phase value 
of the initial sinusoid (= birth sinusoid) is transmitted, whereas for a continuation, no phase 
is transmitted, but the phase can be retrieved from the frequencies. If a track has no 
continuation in the next segment, the track ends or "dies". 
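The differential encoding of a track can be illustrated with a short Python sketch. It assumes a track is given as a list of frequency (or amplitude) values, one per segment, and it leaves out quantization and entropy coding. The function names `encode_track` and `decode_track` are hypothetical.

```python
def encode_track(values):
    """Birth value is sent as-is; each continuation is sent as the
    difference with respect to the previous segment."""
    if not values:
        return []
    return [values[0]] + [values[n] - values[n - 1]
                          for n in range(1, len(values))]

def decode_track(codes):
    """Rebuild the absolute values by accumulating the differences."""
    values = []
    for n, code in enumerate(codes):
        values.append(code if n == 0 else values[-1] + code)
    return values
```

For a track with frequencies 440, 442, 441 and 445, the encoder emits 440, 2, -1 and 4: only the birth carries an absolute value, and the continuations are small numbers that need fewer bits.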
CLAIMS: 



1. A method of encoding a signal, the method comprising the steps of: 

- providing a respective set of sampled signal values for each of a plurality of 
sequential segments; 

- analyzing the sampled signal values to determine one or more sinusoidal 
components for each of the plurality of sequential segments, each sinusoidal component 
including a frequency and a phase value; 

- linking sinusoidal components across a plurality of sequential segments to 
provide sinusoidal tracks; 

- determining, for each sinusoidal track in each of the plurality of sequential 
segments, a predicted phase value as a function of phase value for at least a previous 
segment; 

- determining, for each sinusoidal track, a measured phase value comprising a 
generally monotonically changing value; 

- quantizing sinusoidal codes as a function of the predicted value for the phase 
and the measured phase for the segment where the sinusoidal codes are quantized in 
dependence on at least one frequency value of the respective sinusoidal track; and 

- generating an encoded signal including sinusoidal codes representing the 
frequency and the phase and linking information. 

2. A method according to claim 1 wherein in a first sinusoidal track including a 
first sinusoidal component with a first frequency value the sinusoidal codes are quantized using 
a first quantization accuracy, and in a second sinusoidal track including a second sinusoidal 
component with a second frequency value higher than the first frequency value, the 
sinusoidal codes are quantized using a second quantization accuracy lower than or equal to the 
first quantization accuracy. 

3. A method according to claim 1 wherein the sinusoidal codes for a track 
include an initial phase and an initial frequency, and the predicting step employs the initial 
frequency and the initial phase to provide a first prediction. 
4. A method according to claim 1 wherein the phase value of each linked 
segment is determined as a function of: the integral of the frequency for the previous segment 
and the frequency of the linked segment; and the phase of a previous segment, wherein the 
sinusoidal components include a phase value in the range {-π;π}. 

5. A method according to claim 1 wherein the quantizing of the sinusoidal codes 
includes determining a phase difference between each predicted phase value and the 
corresponding observed phase value. 

6. A method according to claim 4 wherein the generating step comprises 
controlling the quantizing step as a function of the quantized sinusoidal codes. 

7. A method according to claim 6 wherein the sinusoidal codes include an 
indicator of the end of a track. 

8. A method according to claim 1 wherein the method further comprises the steps 
of: 

- synthesizing the sinusoidal components using the sinusoidal codes; 

- subtracting the synthesized signal values from the sampled signal values to 
provide a set of values representing a remainder component of the audio signal; 

- modeling the remainder component of the audio signal by determining 
parameters approximating the remainder component; and 

- including the parameters in an audio stream. 

9. A method according to claim 1 wherein the sampled signal values represent an 
audio signal from which transient components have been removed. 

10. A method of decoding an audio stream, the method comprising the steps of: 

- receiving an encoded signal including sinusoidal codes representing the 
frequency and the phase and linking information; 

- de-quantizing sinusoidal codes, thereby obtaining an unwrapped de-quantized 
phase, where the sinusoidal codes are de-quantized in dependence on at least one frequency 
value of the respective sinusoidal track; 



PHNL030921EPP 



17 18.07.2003 

- calculating the frequency from the de-quantized unwrapped phase values; and 

- employing said de-quantized frequency and phase to synthesize said sinusoidal 
components of said audio signal. 

11. A method according to claim 10 wherein in a first sinusoidal track including a 
first sinusoidal component with a first frequency value the sinusoidal codes are de-quantized 
using a first quantization accuracy, and in a second sinusoidal track including a second 
sinusoidal component with a second frequency value higher than the first frequency value, 
the sinusoidal codes are de-quantized using a second quantization accuracy lower than or equal 
to the first quantization accuracy. 

12. A method according to claim 10 wherein the sinusoidal codes for a track 
include an initial phase and an initial frequency, and the predicting step employs the initial 
frequency and the initial phase to provide a first prediction. 

13. A method according to claim 10 wherein the phase value of each linked 
sinusoidal component is determined as a function of: the integral of the frequency for the 
previous segment and the frequency of the linked segment; and the phase of a previous segment, 
and wherein the sinusoidal components include a phase value in the range {-π;π}. 

14. A method according to claim 13 wherein the generating step comprises: 
controlling the quantizing accuracy as a function of the quantized sinusoidal codes. 

15. Audio encoder arranged to process a respective set of sampled signal values 
for each of a plurality of sequential segments, said coder comprising: 

- an analyzer for analyzing the sampled signal values to determine one or more 
sinusoidal components for each of the plurality of sequential segments, each sinusoidal 
component including a frequency and a phase value; 

- a linker for linking sinusoidal components across a plurality of sequential 
segments to provide sinusoidal tracks; 

- a phase unwrapper for determining, for each sinusoidal track in each of the 
plurality of sequential segments, a predicted phase value as a function of phase value for at 
least a previous segment and for determining, for each sinusoidal track, a measured phase 
value comprising a generally monotonically changing value; 

- a de-quantizer for de-quantizing sinusoidal codes as a function of the 
predicted value for the phase and the measured phase for the segment where the sinusoidal 
codes are de-quantized in dependence on at least one frequency value of the respective 
sinusoidal track; and 

- means for providing an encoded signal including sinusoidal codes 
representing the frequency and the phase. 



16. Audio player comprising: 

- means for reading an encoded audio signal including sinusoidal codes 
representing a frequency and a phase for each track of linked sinusoidal components; 

- a de-quantizer for generating phase values and for generating frequency 
values from said phase values; and 

- a synthesizer arranged to employ said generated phase and frequency values 
to synthesize said sinusoidal components of said audio signal. 



17. Audio system comprising an audio encoder as claimed in claim 15 and an 
audio player as claimed in claim 16. 



18. Audio stream comprising sinusoidal codes representing tracks of linked 
sinusoidal components across a plurality of sequential segments of an audio signal, said 
codes representing a predicted phase value as a function of phase value for at least a previous 
segment and a measured phase value comprising a generally monotonically changing value. 



19. Storage medium on which an audio stream as claimed in claim 18 has been 
stored. 
ABSTRACT: 



In a sinusoidal audio encoder a number of sinusoids are estimated per audio 
segment. A sinusoid is represented by frequency, amplitude and phase. Normally, phase is 
quantized independent of frequency. The invention uses a frequency dependent quantization 
of phase, and in particular the low frequencies are quantized using smaller quantization 
intervals than at higher frequencies. Thus, the unwrapped phases of the lower frequencies are 
quantized more accurately, possibly with a smaller quantization range, than the phases of the 
higher frequencies. The invention gives a significant improvement in decoded signal quality, 
especially for low bit-rate quantizers. 
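The idea of frequency-dependent phase quantization can be sketched in Python as follows. This is only an illustration: it uses a plain uniform quantizer, and the band edges, step sizes and function names (`phase_step`, `quantize_phase`, `dequantize_phase`) are hypothetical values chosen for the example, not parameters of the invention.

```python
import math

def phase_step(freq_hz):
    """Quantization interval for the phase, chosen by frequency band.
    Band edges and step sizes are assumed values for illustration."""
    if freq_hz < 500.0:
        return math.pi / 32   # finest resolution for low frequencies
    if freq_hz < 4000.0:
        return math.pi / 16
    return math.pi / 8        # coarsest resolution for high frequencies

def quantize_phase(phase, freq_hz):
    """Map a phase in radians to an integer index for its frequency band."""
    return round(phase / phase_step(freq_hz))

def dequantize_phase(index, freq_hz):
    """Recover the reconstructed phase from the integer index."""
    return index * phase_step(freq_hz)
```

With these assumed steps, the reconstruction error is at most half a step, so a 100 Hz sinusoid is reconstructed four times more accurately than an 8 kHz sinusoid.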



Fig. 3a 